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A GENERAL STATISTICAL FRAMEWORK FOR DISSECTING 
PARENT-OF-ORIGIN EFFECTS UNDERLYING ENDOSPERM 
TRAITS IN FLOWERING PLANTS 1 

By Gengxin Li and Yuehua Cui 

Michigan State University 

Genomic imprinting has been thought to play an important role 
in seed development in flowering plants. Seed in a flowering plant 
normally contains diploid embryo and triploid endosperm. Empiri- 
cal studies have shown that some economically important endosperm 
traits are genetically controlled by imprinted genes. However, the ex- 
act number and location of the imprinted genes are largely unknown 
due to the lack of efficient statistical mapping methods. Here we pro- 
pose a general statistical variance components framework by utilizing 
the natural information of sex-specific allelic sharing among sibpairs 
in line crosses, to map imprinted quantitative trait loci (iQTL) under- 
lying endosperm traits. We propose a new variance components par- 
tition method considering the unique characteristic of the triploid en- 
dosperm genome, and develop a restricted maximum likelihood esti- 
mation method in an interval scan for estimating and testing genome- 
wide iQTL effects. Cytoplasmic maternal effect which is thought to 
have primary influences on yield and grain quality is also considered 
when testing for genomic imprinting. Extension to multiple iQTL 
analysis is proposed. Asymptotic distribution of the likelihood ratio 
test for testing the variance components under irregular conditions 
are studied. Both simulation study and real data analysis indicate 
good performance and powerfulness of the developed approach. 

1. Introduction. The life cycle of an angiosperm starts with the process 
of double fertilization, where the fertilization of the haploid egg with one 
sperm cell forms the embryo, and the fusion of the two polar nuclei with 
another sperm cell develops into endosperm [Chaudhury et al. (2001)]. Thus, 
endosperm is a tissue unique to angiosperm. The embryo and endosperm are 
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genetically identical, except that the endosperm is triploid composed of one 
set of paternal and two identical sets of maternal chromosomes. In cereals, 
the endosperm of a grain is the major storage organ providing nutrition 
for early-stage seed development, and more than that, serves as the major 
source of food for human beings. The identification of important genes that 
underlie the variation of quantitative traits of various interests in endosperm 
is thus paramountry important. 

Genomic imprinting refers to the situation where the expression of the 
same genes is different depending on their parental origin [Pfifer (2000)]. It 
has been increasingly recognized that many endosperm traits are controlled 
by genomic imprinting. For example, endoreduplication is a commonly ob- 
served phenomenon which shows a maternally controlled parent-of-origin 
effect in maize endosperm [Dilkes et al. (2002)]. Cells undergoing endoredu- 
plication are typically larger than other cells, which consequently results 
in larger fruits or seeds beneficial to human beings [Grime and Mowforth 
(1982)]. Other reports of genomic imprinting with paternal imprinting in 
maize endosperm include, for instance, the r gene in the regulation of an- 
thocyanin [Kermicle (1970)], the seed storage protein regulatory gene dsrl 
[Chaudhuri and Messing (1994)], the ME A gene affecting seed development 
[Kinoshita et al. (1999)] and some a-tubulin genes [Lund, Messing and Viotti 
(1995)]. These studies underscore the value of developing statistical methods 
that empower geneticists to identify the distribution and effects of imprinted 
genes controlling endosperm traits. 

Statistical methods for mapping imprinted genes or imprinted quantita- 
tive trait loci (iQTL) have been extensively studied. Focusing on different ge- 
netic designs and different segregation populations, methods were developed 
in mapping iQTL underlying quantitative traits in controlled experimental 
crosses [e.g., Cui, Cheverud and Wu (2007); Cui et al. (2006); Wolf et al. 
(2008)], in outbred population [e.g., de Koning, Bovenhuis and van Arendonk 
(2002)] and in human population [e.g., Hanson et al. (2001); Shete, Zhou 
and Amos (2003)]. Broadly speaking, these methods can be categorized into 
two frameworks: one based on the fixed effect model where the iQTL effect 
is considered as fixed [e.g., Cui et al. (2006, 2007); de Koning, Bovenhuis and 
van Arendonk (2002)], and the other considering iQTL effect as random and 
estimating the genetic variances contributed by an iQTL [e.g., Hanson et al. 
(2001); Shete, Zhou and Amos (2003); Li and Cui (2009a)]. The method 
proposed by Li and Cui (2009a) extended the variance components model 
to experimental crosses and showed relative merits in mapping iQTLs with 
inbred lines. However, all these approaches for iQTL mapping were devel- 
oped based on diploid populations, whereby chromosomes are paired. Their 
applications are immediately limited when the ploidy level of the study pop- 
ulation is more than two, for instance, the triploid endosperm. 
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In this study we propose to extend our previous work in iQTL mapping 
with the variance components approach in experimental crosses [Li and Cui 
(2009a)], and consider the unique genetic makeup of the triploid endosperm 
genome to map iQTLs underlying triploid endosperm traits. Cytoplasmic 
maternal effects are also considered and adjusted when testing for genomic 
imprinting. Motivated by a real experiment, we propose a reciprocal back- 
cross design initiated with two inbred lines. The likelihood ratio test (LRT) 
is applied to test the significance of the variance components and its asymp- 
totic distribution is evaluated under irregular conditions. 

The article is organized as follows. Section 2 will illustrate the basic ge- 
netic design and the statistical mapping framework. We propose a new ap- 
proach for calculating the parental specific allelic sharing among inbreeding 
triploid sibs. Statistical hypothesis testings are proposed to assess iQTL 
effects. The limiting distribution of the LRT under the proposed mapping 
framework is studied. The multiple iQTL model is also proposed to separate 
closely linked (i)QTLs. Sections 3 and 4 will be devoted to simulations and 
real application followed by a general discussion in Section 5. 

2. Statistical method. 

2.1. The genetic design. Using experimental crosses for QTL mapping 
has been the traditional means in targeting genetic regions harboring po- 
tential genes responsible for quantitative trait variations. Toward the goal 
of mapping iQTL underlying endosperm traits in line crosses, we propose a 
reciprocal backcross design. A similar design was proposed by Li and Cui 
(2009a) for diploid mapping populations. In brief, two inbred parents with 
genotypes AA and aa are crossed to produce an Fi population {Ad). Fi in- 
dividuals are then backcrossed with one of the parents to generate backcross 
populations. We can use both parents as the maternal strain to cross with 
an Fi individual to generate two backcross segregation populations. Or we 
can use Fi individuals as the maternal strains to cross with both parents 
to produce another two sets of segregation populations. The so-called recip- 
rocal backcross design generates four different segregation populations with 
each one being considered as one family. Large number of backcross families 
can be obtained by simply replicating each one of the above crosses. 

To distinguish the allelic parental origin, we use subscript letters / and m 
to denote an allele inherited from the father and mother, respectively. A list 
of possible offspring genotypes considering the unique genetic makeups in 
the triploid endosperm genome is detailed in the second column in Table 1. 
Clearly, the endosperm genome carries one extra maternal copy due to the 
unique double fertilization step in flowering plants. When a dosage effect is 
considered, we do expect different expression values triggered by endosperm 
and embryo genes. 
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2.2. The model. In QTL mapping different line crosses can be combined 
together to increase the parameter inference space via a variance compo- 
nents method [Xie, Gessler and Xu (1998)]. VC method has been shown 
to be powerful in assessing genomic imprinting in human linkage analysis 
[Hanson et al. (2001)]. Recently, Li and Cui (2009a) extended the VC model 
to experimental crosses and proposed an iQTL mapping framework via com- 
bining different line crosses for iQTL detection. We extend our previous work 
to triploid endosperm tissue considering the unique genetic components in 
the endosperm genome. 

Suppose total K families are collected which are composed of the four 
distinct backcross families. Assume n& individuals are sampled in the feth 
family. The phenotypic variation of a quantitative trait in family k (denoted 
as Vk) can be explained by the genotype-specific cytoplasmic maternal ef- 
fect (denoted as additive QTL effect (denoted as a k ), polygene effect 
(denoted as g k ) and random residual effect (denoted as e^). To incorpo- 
rate the parent-of-origin effect, the additive QTL effect (a^) can be further 
partitioned into two separate effects, an effect due to the expression of the 
maternal allele (denoted as (ikm) and an effect due to the expression of the 
paternal allele (denoted as auf)- The model can thus be expressed as 

(2.1) yki = Vk + 2akmi + akfi + gki + eki, k = l,...,K;i = l,...,n k , 

where afc m j, a^/j, gki and e^i are random effects with normal distribution, 
that is, a fcmi ~7V(0,7Tj mjm | A .cr^), a k fi ~ iV(0,7r im/i/ | fc cr^), g u ~ N(Q, fa^g), 
&ki ~ A^(0, of); gki and e k i are uncorrelated to a^mi and a k fi', the coefficient 
2 for akmi adjusts for the effects of two identical maternal copies; Hk models 
the maternal genotype-specific effect; i^i m j m \ki n i/jf\k and 4>ij\ k are the IBD 
coefficients which are explained in the following section. With four distinct 
segregation populations, we have only three distinct maternal genotypes, 
A A, Aa and aa. Thus, the parameter Hk can be collapsed into three distinct 
values denoted as fj,±, fj,2 and fj,^ corresponding to maternal genotypes AA, 
Aa and aa, respectively. Letting ft = (fii, ^2, ^3), then model (2.1) can be 
rewritten in a vector form as 

(2.2) y fc = X fc /3 + 2a fcm + a fc/ + g fc + e fc , k = l,...,K, 

where X k is an n k X 3 matrix with one column of ones and two columns of 
zeros. 

2.3. Parent- specific allele sharing and the covariance between two inbreed- 
ing sibs. One of the major tasks in IBD-based iQTL mapping with the vari- 
ance components model is to calculate the IBD sharing probabilities and the 
phenotypic covariances between sibs. Such a method has been developed in 
the human population [Hanson et al. (2001)], which, however, cannot be ap- 
plied to a complete inbreeding population in experimental crosses, because 
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Fig. 1. Possible alleles shared IBD for individuals i and j in inbreeding backcross fami- 
lies. The solid lines indicate IBD sharing for alleles inherited from the same parent. The 
dotted lines indicate IBD cross-sharing for alleles inherited from different parents. 



the allelic sharing relationship among sibpairs does not follow the pattern 
as the one derived from a natural noninbreeding population. Instead, the 
IBD sharing probability can be calculated based on Malecot's coefficient of 
coancestry (1948) for an inbreeding population. Li and Cui (2009a) recently 
explored different allelic sharing patterns among sibpairs in a reciprocal 
backcross design with a diploid tissue. We extend the method to the triploid 
endosperm genome and derive covariances among sibpairs in a triploid tis- 



sue. 



Consider two individuals i and j randomly selected from one backcross 
family with phenotype yi and yj. Figure 1 shows all possible allelic sharing 
patterns between individuals i and j. The solid line indicates IBD sharing for 
alleles derived from the same parent and the dotted line indicates IBD cross- 
sharing for alleles derived from different parents. The allelic cross-sharing 
is unique to inbreeding populations, whereby this cross-sharing probability 
reduces to zero for noninbreeding populations. Here we propose to calculate 
the IBD sharing between individuals i and j (denoted as 7Ty) for a triploid 



genome as 



(2.3) 



7T; 



39 ij , 

4(5 + 3^0 



if i^h 
if i = j, 



where 9ij is Malecot's coefficient of coancestry and Fi is the inbreeding co- 
efficient [Harris (1964); Cockerham (1983); Lynch and Walsh (1998)]. By 
definition, 0{j is calculated as the probability of two randomly selected al- 
leles from individuals i and j being identical by descent. The calculation 
of iTij is different from the usual IBD sharing calculation in noninbreed- 
ing populations. It is instead interpreted as triple the Malecot coefficient of 
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coancestry [Xie, Gessler and Xu (1998)]. For easy notation, we still adopt 
the term "IBD sharing probability" for nij in the rest of the presentation. 
The calculation of the inbreeding coefficient follows the procedure given in 
Lynch and Walsh (1998). 

To illustrate the idea, consider two backcross individuals i (with genotype 
A m A m Af) and j (with genotype B m B m B f). The coefficient of coancestry 
dij between these two individuals can be expressed as 

Oij = l{Pr(A ml = B ml ) + Pr(A m i = B m2 ) + Pr(A m2 = B ml ) 

+ Vi{A m2 = B m2 ) + Pr(A mi = B f ) + Pr(^ m2 = B f ) 

+ Pr(A/ = B ml ) + Pr(A/ = B m2 ) + Pr(A f = B f )} 

= 9 i^imjm + 29i m j f + 10 i f j m + 9i p f ) , 

where the notation = refers to identical by decent; the subscript numbers 1 
and 2 indicate two maternally inherited alleles; Oi.j. is defined as the allelic 
kinship coefficient [Lynch and Walsh (1998)]. Note that the two terms Oi m j f 
and Oi f j m are indistinguishable, but their sum denoted as 6i m /j f (= ®i m j f + 
ifjm ) is unique. Thus, we have % = jj(40» m j m + 2S i m /j f + ^/i/)- Following 
equation (2.3), we have 

Tij = 3dij = ^Oi m j m + \9i m /j f + \ ft ip s = Ki m j m + TTi m /j f + ^ifjf f° r i 7^ 3- 

It can be seen that the IBD sharing between any two individuals can be 
decomposed as three separate components, one due to the IBD sharing for 
alleles derived from the maternal parent {%i m j m = ^Oi m j m ), one due to the 
cross-sharing for alleles derived from different parents (^i m /j f = ^@i m /j f ) 
and one due to the IBD sharing for alleles derived from the paternal parent 
(ftifjt = 3@ifjf)- An exhaustive list of all possible IBD sharing probabilities 
for the four backcross families is given in Table 1. 

Dropping the family index k, the covariance between any two individuals 
i and j can be expressed as 

Cov(V»Vjki m j m ,'Ki m /3f,'Kifjf) 

= Cov(2a mi + a,fi +gi + e i: 2a m j + dfj + gj + ej) 

= 47r Lj m Vm + 27r i m /j f a rnf + Kifjftf + fo^g + » 

where 7r{ mim = \{^i m j m ) and n' im/jf = |(vr im/j/ ) are the IBD sharing and 

cross-sharing probabilities by considering one single maternal allele; 
measures the variation of trait distribution due to alleles cross-sharing; <f>ij 
is the expected alleles shared IBD; ijj is an indicator variable taking value 
1 if i = j and if i ^ j. For a natural population without inbreeding, there 
is no allele cross-sharing for an individual with itself, hence, ~^i m /j f = 0. For 



8 



G. LI AND Y. CUI 



a diploid noninbreeding population, the trait covariance can be simplified 
as the one given in Shete, Zhou and Amos (2003). In matrix form, the 
phenotypic variance-covariance for individuals in the kth backcross family 
can then be expressed as 

. \ ^ 2 2 2 2 2 

(2.4) S fe = H m \ k a m + H m /f\ k a mf + U.f\ k a f + ® g \kV g + > 
where the elements of n m i^, II/iA; and Tl m /fi k can be found in Table 1. 

2.4. QTL IBD sharing and genome-wide linkage scan. The above de- 
scribed IBD sharing probability is calculated at a known marker position. 
Unless markers are dense enough, we have to search across the genome for 
potential (i)QTL positions and their effects. In general, the QTL position 
can be viewed as a fixed parameter by searching for a putative QTL at every 
1 or 2 cM on a map interval bracketed by two markers throughout the en- 
tire linkage map. Thus, we need to estimate the QTL IBD sharing at every 
scan position. Since the conditional probability of an endosperm QTL given 
upon two flanking markers is the same as the one derived from a diploid 
genome [Cui and Wu (2005)], the same procedure termed as the expected 
conditional IBD sharing described in Li and Cui (2009a) can be applied to 
calculate the QTL IBD sharing probability at every scan position. 

Assuming multivariate normality of the trait distribution for data in each 
family and assuming independence between families, the joint log-likelihood 
function when K backcross families are sampled can be formulated as 

K 

(2.5) £ = ^log[/(y fc ;/x fc ,S fe )], 

k=l 

where / is the multivariate normal density. Parameters to be estimated 
include /3 = (/ii, ^2, ^3) and Q, = (a^a^a^pO^^a^). Two commonly used 
methods in linkage analysis, the maximum likelihood (ML) method and the 
restricted maximum likelihood (REML) method, may be applied to estimate 
parameters. It is commonly recognized that the REML method gives less bi- 
ased estimation compared to the ML method [Corbeil and Searle (1976)]. 
Here we adopt the REML method with the Fisher scoring algorithm to 
obtain the REML estimates [see Li and Cui (2009a) for details of the algo- 
rithm] . 

The conditional QTL IBD-sharing values vary at different testing posi- 
tions. The amount of support for a QTL at a particular map position can 
be displayed graphically through the use of likelihood ratio profiles, which 
reflect the variation of the testing position of putative QTLs. The signifi- 
cant QTLs are detected by the peaks of the profile plot that pass a certain 
significant threshold (see Section 2.5 for more details). 
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2.5. Hypothesis testing. In iQTL mapping, we are interested in testing 
whether there is any significant genetic effect at a test position and would 
like to further quantify the imprinting effect if any. The hypothesis for testing 
the existence of a QTL can be expressed as 

(2.6) { H ° 1 a ™ = °? = a ™f = °' 

[ Hi : at least one parameter is not zero. 

The LRT is applied for this purpose. Define £1 and f2 to be the estimates of 
the unknown parameters under Hq and Hi, respectively. The LRT statistic 
can be calculated as 

(2.7) LR = -2|logL(fi|y)-logL(fi|y)]. 

Let = (m ii 2 Ai3 Oi 6 2 9 3 6» 4 6 5 ) T = (m fi 2 fJ, 3 <?m o~ 2 f a 2 mf a 2 a 2 ) T € fi = 
R 3 x [0, oo) x [0, oo) x [0, oo) x (0, oo) x (0, oo) be the parameters to be es- 
timated. Note that the polygene variance is bounded away from zero if we 
assume there are more than one QTL in the genome. Let the true parameters 
under the null hypothesis be 6 = (fi w p 20 ^30 4 0/ o o- 2 mf() a 2 Q a 2 Q ) T = 
(A*io M20 Aiso a 2 a 2 ) T e Q = R 3 x {0} x {0} x {0} x (0, oo) x (0, oo) . 
The three tested genetic variance components under the null hypothesis lie 
on the boundaries of the parameter space ft. Thus, the standard conditions 
for obtaining the asymptotic x 2 distribution of the LRT are not satisfied 
[Self and Liang (1987)]. Following the results from Chernoff (1954), Shapiro 
(1985) and Self and Liang (1987), the following theorem states that the 
LR statistic follows a mixture chi-square distribution, whereby the mixture 
proportions depend on the estimated Fisher information matrix. 

Theorem 2.1. Let Cq and Cci be closed convex cones with vertex at 
6q to approximate £Iq and f2, respectively. Let Y be a random variable with 
a multivariate normal distribution with mean Oq, and variance-covariance 
matrix I~~ 1 (Oo). Under the assumptions given in the Appendix, the LR statis- 
tic in (2.7) is asymptotically distributed as a mixture chi-square distribu- 
tion with the form W3X3 • U 2X 2 : w iXi : w oXo> where W3 = [2tt — cos -1 pi 2 — 

COS _1 pi3 - COS~ 1 p23], ^2 = 4^[3tT - COS" 1 Pl2|3 ~ COS" 1 />13|2 ~ COS -1 />23|lL 

^i = ^:(cos~ 1 pi2 + cos" 1 pi3 + cos~ 1 p23), and w = ^-^pTr-cos -1 ^^- 
cos _1 pi3|2 — cos -1 P23|i]> Pab is the correlation between the variance terms a 
and b calculated from the Fisher information matrix, and p a b\c = 

{Pab-Pacpbc) 
(l-p2 c )l/2(l- p 2 c )l/2 • 

Note that the symbol ir in the above theorem is the irrational number (a 
mathematical constant) not the IBD sharing probability. The proof of the 
theorem is given in the Appendix. 
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Remark. When the random parameter estimators are uncorrelated or 
the correlation is extremely small, that is, the Fisher information matrix is 
close to diagonal, the mixture proportions for the x\ components are reduced 
to the binomial form with (^)2~ 3 , which is consistent with the result (Case 
9) given in Self and Liang (1987). 

Once a QTL is identified at a genomic position, we can further assess its 
imprinting property. To evaluate whether a QTL shows imprinting effect, 
the hypotheses can be formulated as 



(2. 



Ho'.a'j — cr^, 



H x :aj^a 2 



Again, the likelihood ratio test can be applied which asymptotically follows 
a x 2 distribution with 1 degree of freedom since the tested parameter under 
the null is nonnegative and does not lie on the boundary of the parameter 
space. Rejecting Hq indicates genomic imprinting, and the QTL can be called 
an iQTL. We denote this imprinting test as LRi mp . If the null is rejected, 
one would be interested in testing whether the detected iQTL is completely 
maternally or paternally imprinted with the corresponding null hypothesis 
expressed as Hq : = and Hq:<j^= 0, respectively. The LRT statistic for 
the two tests asymptotically follows a mixture x 2 distribution with the form 
2X0 • 2^1" Rejection of complete imprinting indicates partial imprinting. 

Maternal effects can be tested by formulating hypothesis: Hq : fj,\ = n% = 
/13. Note that these three parameters do not represent the true maternal 
effects, as they are confounded with the main genetic effects. But a test of 
pairwise differences can be applied to detect the significance of any maternal 
contribution. 



2.6. Multiple iQTL model. In practice, there may be several QTLs to 
reflect the phenotypic variation in the whole genome. When testing QTL 
effects at one chromosome, effects from QTLs located at other chromosomes 
are absorbed by the polygenic effect (g). In some cases, two or more QTLs 
may be located at the same chromosome, which are termed as background 
QTL(s) in comparison to the tested one. When this happens, it is essential to 
adjust for the background QTL(s)' effects. Otherwise, it may lead to biased 
estimation for the putative QTL caused by the interference of QTL(s) close 
to the tested interval [Zeng (1994)]. 

In the previous work of Li and Cui (2009a), the authors proposed a mul- 
tiple iQTL model following the idea of next-to-flanking markers proposed 
by Xu and Atchley (1995). We adopted a similar strategy in the current 
study. Briefly, assuming there are S (i)QTLs in one chromosome, the mul- 
tiple iQTL model considering parent-specific allele effect can be expressed 
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as 

S S 

Uki = Vk + la-kmis + afc/is + 9ki + efci, A; = 1, . . . ,K; i = 1, . . . , n.*., 

s=l s=l 

where each (i)QTL effect is partitioned as two separate terms to reflect 
the contribution of the maternal and paternal alleles. In reality, the exact 
number and location of QTLs in a chromosome is generally unknown before 
doing a genome-wide search. This problem can be eased by applying the 
next-to-flanking markers idea proposed by Xu and Atchley (1995). 

Denote a test interval with two flanking markers as Mi~M r . The markers 
next to these two markers are denoted as Ml on the left of Mi, and Mr on 
the right of M r (L = 1 — 1 and R = r + 1). Conditional on the two markers, 
Ml and Mr, we expect the effects of QTL(s) located outside of the tested 
interval can be absorbed by the IBD values calculated from the two next-to- 
flanking markers [Xu and Atchley (1995)]. Thus, the calculation of (i)QTL 
covariance conditional on these two markers will avoid the requirement for 
the position of QTLs outside of the tested interval. Dropping the family 
index, the phenotypic covariance between two individuals i and j can be 
expressed as 

Cov(yi,yj |vr L , vr imim , n im / jf , TT ifjf , ttr) 
l 

1=1 

R 

+ ^2 K(6ir, ir R )al + (j)ija 2 g + Iijal 

r=l 

2 2 2 2 2 2 2 

= K L (Jl + Ki m j m CTm + ^i m l3i a mS + ^ifif Cr f + 7T R a R + 4>ij° g + kj<? e , 

where ~kl is the IBD sharing value at marker L, and a L is a composite 
variance component which reflects the variation of (i)QTL effects on the left 
side of the tested interval [see Li and Cui (2009a) for details], -kr and a\ are 
defined similarly. The calculations of ttl and ttr reflect the triploid structure 
of the endosperm genome. Testing (i)QTL effects can then be focused on a 
tested interval while adjusting for the background QTLs' effects located in 
another place. 

3. Simulation. Simulation studies are conducted to investigate the 
method performance. We assume a fixed total sample size of 400, then vary 
the family and offspring size with different combinations, that is, 4 x 100, 
8 x 50, 20 x 20 and 100 x 4, in order to evaluate the effect of family and 
offspring size on testing power and parameter estimation. Simulation details 
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are given in the Simulation and real data analysis. Here we briefly summa- 
rize the main results. 

3.1. Single iQTL simulation. For the single iQTL simulation, the results 
show that both the 4 x 100 and the 100 x 4 designs yield lower QTL detec- 
tion power and higher RMSE (root mean squared error) for QTL position 
estimation than the other two designs do. The 20 x 20 design slightly beats 
the 8 x 50 design with smaller imprinting type I error and higher QTL detec- 
tion power. These results indicate that it is necessary to maintain a balance 
between the family size and the offspring size, in order to achieve optimal 
power and good effects estimation precision. For a given budget with a fixed 
total sample size, one should always try to avoid extreme designs with a 
large (or small) number of families, each with a small (or large) number of 
offsprings. 

Focusing on the 20 x 20 design, simulations are performed to show the 
model behavior under different imprinting modes, that is, complete pater- 
nal imprinting, complete maternal imprinting, partial maternal imprinting 
and partial paternal imprinting. The results indicate that the power to de- 
tect imprinting depends on the underlying degree of imprinting. Relatively 
higher imprinting power is observed when an iQTL is maternally imprinting 
compared to the case when an iQTL is paternally imprinting. 

3.2. Multiple iQTL simulation. In this simulation data are simulated by 
assuming two (i)QTLs located at two genomic positions and are subject 
to both the single iQTL and multiple iQTL analyses. The results indicate 
a clear benefit of analysis by fitting a multiple iQTL model rather than 
fitting a single iQTL model. While the single iQTL analysis detects one 
"ghost" QTL located between the two simulated QTLs, the multiple iQTL 
analysis can clearly separate the two QTLs with high precision. Note that 
the multiple iQTL analysis normally generates lower LR values than the 
single iQTL analysis does. Note that the distribution of the LR value under 
the multiple iQTL analysis is not clear, and permutation should be applied 
to assess significance of any (i)QTLs in multiple iQTL analysis [Xu and 
Atchley (1995)]. 

4. A case study. We apply our method to a real data set which has two 
endosperm traits of interests: mean ploidy level (denoted as Mploidy) and 
percentage of endoreduplicated nuclei (denoted as Endo) . The two traits de- 
scribe the level of endoreduplication in maize endosperm, which is thought to 
be genetically controlled by imprinted genes [Dilkes et al. (2002)]. Four back- 
cross (BC) segregation populations, initiated with two inbred lines, Sgl8 and 
Mol7, were sampled. The four BC populations were obtained following the 
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Fig. 2. The profile of the log-likelihood ratios (LR) for testing the existence of QTLs un- 
derlying the two endosperm traits across the 10 maize linkage groups (G\, . . . ,Gio). The 
genome-wide LR profiles for the percentage of endoreduplication (Endo) and mean ploidy 
(Mploidy) traits are indicated by solid and dotted curves, respectively. The threshold val- 
ues for claiming the existence of QTLs are given as the horizonal solid and dotted line for 
the genome-wide threshold, and the dashed and dash-dotted line for the chromosome-wide 
threshold, for the two traits Endo and Mploidy, respectively. The genomic positions corre- 
sponding to the peak of the curves that pass the corresponding thresholds are the MLEs of 
the QTL location. The positions of markers on the linkage groups [Coelho et al. (2007)] 
are indicated at ticks. 



design illustrated in Table 1. The data show a large degree of variation for en- 
doreduplication among the four BC populations, and ten linkage groups were 
constructed from the observed marker data [Coelho et al. (2007)]. Readers 
are referred to Coelho et al. (2007) for more details about the data. The two 
traits are analyzed with our multiple iQTL model aimed to identify iQTLs 
across the ten linkage groups. The data are also analyzed with a Mendelian 
model. Results from both imprinting and Mendelian models are compared 
and summarized in the Supplementary Materials. 

Figure 2 plots the LR values across the ten linkage groups for the two 
traits. The solid and dotted curves represent LR profiles for traits Endo 
and Mploidy, respectively. To adjust for the genome-wide error rate across 
the entire linkage group, permutation tests are applied in which the critical 
threshold value is empirically calculated on the basis of repeatedly shuf- 
fling the relationships between marker genotypes and phenotypes within 
each BC family [Churchill and Doerge (1994)]. The corresponding genome- 
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wide significance thresholds (at 5% level) for the two traits are denoted by 
the horizontal solid (for Endo) and dotted (for Mploidy) lines. The 5% level 
chromosome-wide thresholds are denoted by the dashed (for Endo) and dash- 
dotted (for Mploidy) lines. QTLs that are significant at the chromosome- 
wide level are called suggestive QTLs. It can be seen that two QTLs (on 
G7 and G9) associated with Mploidy and one QTL (on G6) associated with 
Endo are detected at the 5% genome- wide significance level (denoted by "*" 
in Table 2). Two suggestive QTLs (on G2 and G10) associated with Endo 
and one suggestive QTL (on G6) associated with Mploidy are also identified. 
The detailed QTL location and effect estimates as well as the test results 
for imprinting are tabulated in Table 2. For the trait Mploidy, the identified 
three QTLs are all imprinted (pimp < 0.05) and all show completely mater- 
nal imprinting, that is, the maternal copy does not express. They are thus 
termed iQTLs. The cytoplasmic maternal effect does not show any evidence 
of significance for all the three iQTLs (pm > 0.05). For the trait Endo, only 
the QTL detected on G6 shows imprinting effect (pi mp < 0.05) and it shows 
completely paternal imprinting (pf < 0.05). The other two QTLs do not 
show evidence of imprinting (pi mp > 0.05). For this trait, significant mater- 
nal effects are detected (pm < 0.01). 

In our study, one maternally controlled iQTL is detected for trait Endo, 
which is consistent with the result given by Dilkes et al. (2002). Mean- 
while, according to the genetic conflict theory proposed by Haig and West- 
oby (1991), maternally derived alleles tend to trigger a negative effect on 
the increase of endosperm growth, whereas paternally derived alleles tend to 
play an opposite effect to increase seed size. The identified iQTLs showing 
maternal imprinting for trait Mploidy can be well explained by the genetic 
conflict theory. Both empirical evidence and theoretical hypothesis support 
the current finding. 

5. Discussion. The role of genomic imprinting in endosperm develop- 
ment has been commonly recognized [Dilkes et al. (2002); Kinoshita et al. 
(1999); Chaudhuri and Messing (1994)]. But little is known about the exact 
location and effect size of imprinted genes in endosperm. As endosperm in 
cereal provides the most nutrition for human beings, it is important to iden- 
tify imprinted genes that govern seed development, particularly endosperm 
development. In this article we develop a variance components linkage anal- 
ysis method with an experimental cross design, aimed to identify iQTLs in 
endosperm. Our method is motivated by real applications and is evaluated 
through Monte Carlo simulations. 

The proposed method is based on a particular genetic design (reciprocal 
BC design) with inbreeding populations. We treat iQTL effects as random, 
different from a fixed-effect iQTL model [e.g., Cui (2007)]. Variance compo- 
nents linkage analysis with a partial inbreeding human population was pre- 
viously proposed [see Abney, McPeek and Ober (2000)]. However, extending 



Table 2 

The estimated parameters for the three maternal effects and the variance components for two endosperm traits: mean ploidy (Mploidy) 

and percent of the endoreduplicated nuclei (Endo) 
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0.045 
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0.92 
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0.02 
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50.67 


0.58 


0.03 
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0.17 
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The three QTLs for trait Mploidy are located at marker umcl805, marker dupssr9 and umcl040 + 5.76cM on chromosome 6, 7 and 9, 
respectively. The three QTLs for trait Endo are located at marker umc2094, bnlg345 + 33.49cM and MMC501 + 18cM on chromosome 
2, 6 and 10, respectively. QTLs showing significance at the genome- wide significance level are indicated by "„". pm, Pimp, Pm and p/ are 
the p-values for testing maternal effect (Ho : fJ-i — /Lt2 = ^3), imprinting effect (Ho : = o^), complete maternal imprinting (Ho : a m — 0) 
and complete paternal (Ho : a 2 j = 0), respectively. 
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the VC model to a completely inbreeding population is challenging. In our 
previous work, we proposed a VC-based iQTL mapping framework for an 
inbreeding diploid mapping population [Li and Cui (2009a)]. Extending the 
previous work, we propose a novel IBD partitioning approach to calculate 
allelic sharing in an inbreeding endosperm population. Extension to map- 
ping multiple iQTLs is provided. Simulations indicate good performance of 
the multiple iQTL analysis compared to a single iQTL model. Meanwhile, 
to obtain a good balance of iQTL position and effect estimation as well as 
detection power, we have to avoid extreme sample designs. For a fixed total 
sample size, extremely large or small families should be always avoided. 

In an application to two endosperm traits, we identified three iQTLs for 
trait Mploidy. All show paternal expression. We also identified one iQTL for 
trait Endo, which shows a maternal expression. According to the parental 
conflict theory proposed by Haig and Westoby (1991), maternally derived 
alleles trigger a negative effect on endosperm cell growth and inhibit en- 
dosperm development because the extra maternal copy could slower nuclear 
division in endosperm. On the contrary, paternally derived alleles tend to 
increase seed size. Thus, the three iQTLs identified for Mploidy can be ex- 
plained by the genetic conflict theory. The occurrence of parental conflict 
theory explains parent-of-origin effects as an ubiquitous mechanism for the 
control of early seed development [Grossniklaus et al. (2001); Kinoshita et 
al. (1999)]. 

In VC-based linkage analyses, likelihood ratio test (LRT) has been com- 
monly applied in assessing QTL significance. The LRT statistic asymptot- 
ically follows a mixture \ 2 distribution with binomial mixture coefficients, 
as many investigators often claimed [following Case 9 in Self and Liang 
(1987)]. In a recent investigation, we found that the LRT in a regular VC- 
based linkage analysis without considering imprinting follows a mixture x 2 
distribution with mixture proportions depending on the estimated Fisher 
information matrix [Li and Cui (2009b)]. The modified calculation of mix- 
ture proportion does give more reasonable type I error rate than the one 
with binomial coefficients. When imprinting is considered, we show that the 
limiting distribution of the LRT also follows a mixture x 2 distribution, and 
we adopt the new criterion for power evaluation. Simulations show that the 
new criterion gives type I error closer to the nominal level than the one using 
binomial coefficients, and also produces power as good as the later one (data 
not shown). We recommend investigators adopt the new criterion in their 
analysis. 

Increasing evidence has suggested that for correlated traits, multivariate 
approaches can increase the power and precision to identify genetic effects 
in genetic linkage analyses [e.g., Boomsma and Dolan (1998); Amos and An- 
drade (2001); Evans (2002)]. Also, the joint analysis of multivariate traits 
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can provide a platform for testing a number of biologically interesting hy- 
potheses, such as testing pleiotropic effects of QTL and testing pleiotropic vs 
close linkage. Moreover, if the putative QTL has pleiotropic effects on sev- 
eral traits, the joint analysis may perform better than mapping each trait 
separately [Jiang and Zeng (1995)]. Multivariate traits appear frequently in 
genetic mapping studies. For example, the two endosperm traits evaluated 
in this study are highly correlated [Coelho et al. (2007)]. We expect joint 
analysis may provide high mapping resolution and power for iQTL detection. 
This will be explored in our future investigation. A computer code written 
in R for implementing the current analysis is available upon request. 

APPENDIX 

In standard human linkage analysis with a variance components model, 
many authors declare that the likelihood ratio statistic follows a mixture 
X 2 distribution with binomial coefficient for each mixture component [e.g., 
Amos and Andrade (2001); Hanson et al. (2001); Shete, Zhou and Amos 
(2003)]. Following Chernoff (1954), Shapiro (1985) and Self and Liang (1987), 
in the following we show that the mixture proportion actually depends on 
the estimated Fisher information matrix. 

For a random sample X with density function f(pc;0), following Chernoff 
(1954) and Self and Liang (1987), assume that: 

(i) For any true parameter 6q, the neighborhood of 6q is closed and the 
intersection between this closure and O, defined in the main text is also a 
closed set. 

(ii) The first three derivatives of log f(xi\6) with respect to 6 on the 
intersection of the neighborhood of Oq and fi almost surely exist. Moreover, 
| gg^g°o/ \ < W (x) for all 9 on the intersection, and E[W(x)} < oo. 

(iii) The information matrix 1(9) is positive definite on neighborhoods 

of e . 

(vi) The set O is convex. 

Assuming the above assumptions, the consistency, weak convergence and 
asymptotic normality of the estimators can be established [see Chernoff 
(1954); Self and Liang (1987); Shapiro (1985)]. Here we cite the main results 
from Chernoff (1954), Shapiro (1985) and Self and Liang (1987) to show the 
asymptotic distribution of the LRT in our case. 

Defining two closed polyhedral convex cones Cq and Cq 1 to approximate 
f2o and f^i at 6q, the parameter space under the null hypothesis is approx- 
imated as Cn = {0:0 € M 3 x {0} x {0} x {0} x (0,oo) x (0,oo)}, against 
C Ul = {6 : 6 € M 3 x [0, oo) x [0, oo) x [0, oo) x (0, oo) x (0, oo)} under the al- 
ternative. Let Y' be a random variable generated from the multivariate nor- 
mal distribution, that is, Y' ~ N(0 , 7 _1 (0 O ))- Following Chernoff [(1954), 
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Theorem 1], the asymptotic distribution of the LRT in (2.7) is equivalent to 
the following quadratic approximation: 

(Al) LR*= inf (Y'-6»)7(0 O )(Y / - 0) - inf (Y' - 0)7(0 O )(Y' - 0). 

Subtracting 6q from Y' and 0, the expression in (Al) is given by 

LR*= inf (Y-0)7(0 o )(Y-0)- inf (Y - 0)7(0 O )(Y - 0), 

0eCn o -e o eec ni -0o 

(A2) 

where Y = Y' — 6q ~ N(0, 1 1 (0o)) under the linear transformation. 

Let C7* = (C ni - 0o) n (Cn - O ) C = {0 : Ox > 0, # 2 > 0, # 3 > 0}, which is a 
closed polyhedral convex cone with 3 dimensions. By the Pythagoras theo- 
rem, the statistic in (A2) can be expressed as 

(A3) LR* = inf ( Y - 0)7(0 O ) ( Y - 0) . 

flee* 

Let F(C X ) be the set of all faces of C*. C*° = {76 M 3 : 7 r < 0, V0 G C*} is 
defined to be a polar cone such that (C* ) = C*. Following Shapiro (1985), 
we can select a face f G ,F(C*) corresponding to the polar face u° G .F(C* ) 
such that the linear spaces generated by v and u° are orthogonal to each 
other. For one face v (or z/°), a projection (or 7]/)) [a symmetric idem- 
potent matrix giving projection onto the space generated by v (or z/ )] can 
be found such that T v = I — T UQ since they are orthogonal. Then T v Y (or 
T„oY) is a projection of Y onto (or C*°). 

For a given Y, let g(Y) be the minimizer to achieve the infimum in (A3). 
Define tp u \ Y = {Y G K 3 : g(Y) G u) so that s(Y) G v if and only if T V Y G C* 
and T^oY G C* . By Shapiro (1985), g{Y) = T V Y G C*, VY G ^|Y- 

Note that the set VV|Y is composed of 2 3 disjoint sets in M 3 . All these 
disjoint sets can be classified into four categories as follows: 

(1) = {Y; Yx > 0, Y 2 > 0, Y 3 > 0, g(Y) G v}, 

(2) = {Y;Yx > 0,Y 2 > 0,Y 3 < 0,g(Y) G ^}; ^ 3 |Y = {Y; Y x > 0,Y 2 < 
0, Y 3 > 0,g(Y) £4;^ |Y = {Y; Yi < 0, Y 2 > 0, Y 3 > 0,g(Y) G 1/}, 

(3) = {Y;^ < 0,Y 2 < 0,Y 3 > 0,g(Y) G ^}; ^ |Y = {Y;Yx > 0,Y 2 < 
0, y 3 < 0, g(Y) G ^}; Vl| Y = {Y; Yx < 0, Y 2 > 0, Y 3 < 0, g(Y) G v}, 

(4) ^ |Y = {Y; Yx < 0, y 2 < 0, Y 3 < 0, g(Y) G u}. 

By linear transformation, we cab define C* = {G* : 0* = A 1 / 2 P'0, V0 G C*} 
which is a polyhedral closed convex cone. Then (A3) can be further expressed 
as 

(A4) LR* = inf ||z-0*|| 2 , 

e*ec* 
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where z = A 1//2 i-*'Y [PAP T = I(6q)] has a multivariate normal distribution 
with mean and identity covariance matrix. 

Let C*° be a polar cone of C* and (C*°)° = C* . Two faces v* and 
u*° can be defined with respect to J-(C*) and J-(C*°). The relevant or- 
thogonal projections T v * and T v *a corresponding to v* and v*° can be de- 
fined. Suppose h(z) is the minimizer to achieve the infimum in (A4). Fol- 
lowing Shapiro (1985), a set VV|z can be defined similarly as tp u \Y, such 
that h(z) = T v *z G C*, Vz G Vi>*|z- It satisfies the conditions of Lemma 3.1 
[Shapiro (1985)]. Then we have 

LR* = \\z-h(z)\\ 2 = \\z-T u *zf = z'(I-T u *)z = z'T v » z Vz G W*|z- 

(A5) 

Thus, the distribution of LR* in (A3) can be evaluated by 
Pr {LR* > (?) 

= Pr ^(Y - <7(Y))'I(0 O )(Y - g(Y)) > c\ Y G U#|y) 

2 3 

= £ Pr(Y G ^| Y ) Pr (( Y " S(Y))7(0 O )(Y - 5 (Y)) > c 2 |Y G ^ |Y ) 
j=i 

2 3 

= ]TPr(Y G Vt| Y ) Pr(z'l>z > c 2 |z G <.| z ), 
i=l 

where, conditional on z G V^*| z i z'T v »az is a chi-square distribution [Lemma 
3.1, Shapiro (1985)]. By Bayes' theorem, the distribution of LR* follows 
a mixture chi-square distribution with mixing proportions Pr(Y G V^i Y ) 

(£ = l,...,2 3 )and E£iPr(YG^, Y ) = l. 

The calculation of the mixture proportions follows Plackett (1954). Specif- 
ically, when Y G iphyi LR* ~ x§, and the corresponding mixture proportion 

w 3 = Pr(Y G V^| Y ) = 4^[2vr - cos _1 pi2 - cos -1 ^ - cos -1 ^]- For cate- 
gory (2), LR* ~ xi f° r Y G ipyvy, £ = 2, 3, 4, with the corresponding mixture 
probability calculated by W2 = X^j=2^ >r (Y G V£|y) = 17 P 71 " ~~ cos ~ 1 Pi2|3 ~~ 
cos" 1 Pi3| 2 - cos" 1 p 2 3|i]- Correspondingly, LR* ~ Xi for Y G V*| Y , * = 5,6, 7, 
with the relevant mixture probability evaluated as w\ = Ylj=5 P r (Y G ^,i Y ) = 
2 — W3 in category (3). For the last category, LR* ~ Xo for Y G ip^iy with the 
mixture probability wo = Pr(Y G ^| Y ) = 1 ~~ Note /9 a {, is the correlation 
between the terms a and b calculated from the Fisher information matrix, 
and p nh \ r = j- — ^ a s7,o^ Phc } ttf> - For more details of the derivation, the readers 
are referred to Li and Cui (2009b). 
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SUPPLEMENTARY MATERIAL 

Simulation and real data analysis: (DOI: 10.1214/09-AOAS323SUPP; .zip). 
Details for simulation are included in the supplemental file. We also analyze 
the data with a Mendelian model. A comparison of results with both im- 
printing and Mendelian models is summarized in the supplemental file. 
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